Generalized super-resolution 4D Flow MRI - using ensemble learning to extend across the cardiovascular system

4D Flow Magnetic Resonance Imaging (4D Flow MRI) is a non-invasive measurement technique capable of quantifying blood flow across the cardiovascular system. While practical use is limited by spatial resolution and image noise, incorporation of trained super-resolution (SR) networks has potential to enhance image quality post-scan. However, these efforts have predominantly been restricted to narrowly defined cardiovascular domains, with limited exploration of how SR performance extends across the cardiovascular system; a task aggravated by contrasting hemodynamic conditions apparent across the cardiovasculature. The aim of our study was to explore the generalizability of SR 4D Flow MRI using a combination of heterogeneous training sets and dedicated ensemble learning. With synthetic training data generated across three disparate domains (cardiac, aortic, cerebrovascular), varying convolutional base and ensemble learners were evaluated as a function of domain and architecture, quantifying performance on both in-silico and acquired in-vivo data from the same three domains. Results show that both bagging and stacking ensembling enhance SR performance across domains, accurately predicting high-resolution velocities from low-resolution input data in-silico. Likewise, optimized networks successfully recover native resolution velocities from downsampled in-vivo data, as well as show qualitative potential in generating denoised SR-images from clinical-level input data. In conclusion, our work presents a viable approach for generalized SR 4D Flow MRI, with ensemble learning extending utility across various clinical areas of interest.

changes in blood flow, velocity, and pressure all indicative of disease onset and progression across the entire cardiovascular system [1]. Amongst a range of available techniques, time-resolved three-dimensional phase-contrast magnetic resonance imaging, more commonly 4D Flow MRI, has emerged as one of the most promising imaging techniques, allowing for the non-invasive capture of full-field hemodynamics [2]. The impact of this technique has also been exemplified across various cardiovascular application areas, from the heart [3] and aorta [4], to the brain [5]. In clinical practice, the use of 4D Flow MRI is still limited by effective spatiotemporal resolution, with acquired voxel size being in direct trade-off with effective signal-to-noise ratio and required scan time. Further, with accurate image-based quantification of velocity [6], flow [7], and pressure [8] all directly dependent on acquired resolution, there remains a definitive need for effective approaches to achieve high-resolution 4D Flow MRI in order to extend use across a wider spectrum of cardiovascular application areas.
To address the need for improved resolution, novel acquisition protocols or high-Tesla systems have been proposed [9], [10]; however, these are limited to pre-defined systems. Image-based computational fluid dynamics (CFD) provides an avenue for unrestricted resolution [11], but requires high-performance computational resources and well-defined model geometry and boundary conditions. Alternatively, deep learning methods have been proposed to enable super-resolution image conversion post acquisition, with networks developed in non-medical settings now entering the field of medical imaging. For anatomical super-resolution MRI, deep convolutional networks have been proposed across various application areas [12], [13], and novel generative adversarial [14] or attention networks [15] have also been introduced. For super-resolution 4D Flow MRI, residual networks have shown particular promise [16]-[18], and together with recent examples including unsupervised [19] or physics-informed neural networks (PINNs) [20], this all highlights the increasing interest in using deep learning to enhance the quality of clinically acquired flow data.
Despite this increasing interest, networks have almost exclusively been trained and tested on isolated, pre-defined cardiovascular compartments: a number of studies targeting cerebrovascular flow enhancement using cerebrovascular input data [17]; others using aortic input data to enhance aortic flow capture [16], [21]; and for super-resolution PINNs, retraining is so far required whenever transferring to a new anatomy [20]. In the setting of supervised networks, this is a particular constraint, as performance is directly dependent on the similarity between training and testing data. To exemplify, Ferdian et al. showed how application of an aortic network in a cerebrovascular setting yielded distinctive prediction biases, necessitating domain-specific training data whenever applied on novel domains [17]. Shit et al. [18] utilized training data from multiple flow compartments; however, generalizability of super-resolution networks into unseen cardiovascular domains remains an unassessed problem, not least considering the contrasting hemodynamic conditions present across the cardiovascular system.
The issue of generalizability is, however, an area of active research. Data-centric approaches including data augmentation or cross-validation are commonly employed, and transfer learning strategies are tailored to improve performance beyond a pre-defined training domain [22]. Amongst available approaches, ensemble learning has emerged as an area of particular promise, where multiple base learners are combined in a meta approach to improve performance beyond that of any singular input network [23]. Crucially, ensemble learning has shown specific potential to improve out-of-distribution generalization through combination of heterogeneous base learners: either by varying training data, or by varying base architectures. While successfully employed for non-medical super-resolution imaging [24], ensemble strategies have yet to be explored for super-resolution 4D Flow MRI.
The aim of this study is therefore to evaluate the utility of ensemble learning in the setting of super-resolution 4D Flow MRI, focusing on the ability to generalize performance across multiple cardiovascular domains. Using the existing super-resolution network 4DFlowNet [16] as a base framework, and utilizing synthetic and clinically acquired 4D Flow MRI data from various cardiovascular compartments for training, testing, and validation, our contributions lie in (1) quantifying the limitations in generalizability of base learners trained on isolated cardiovascular compartments; (2) assessing the performance gain of various ensemble learning setups for improving super-resolution performance across disparate cardiovascular domains; and (3) translating utilities into a direct clinical setting, paving the way for super-resolution 4D Flow MRI in direct cardiovascular practice.

A. Models and data preparation
1) Patient-specific cardiovascular models: To train a supervised super-resolution network, coupled sets of low- and high-resolution images need to be acquired. In practice, collecting such paired data is inherently difficult, not least considering that high-resolution data suitable for training would require virtually noise-free, artifact-free input, acquired at resolutions beyond clinical routine. As an alternative, synthetic 4D Flow MRI originating from patient-specific CFD models has been successfully utilized [16]-[18], allowing for input data at unrestricted spatiotemporal sampling.
For the purpose of assessing generalizability, we utilize anatomically accurate patient-specific CFD models from three different cardiovascular compartments: the heart, the aorta, and the cerebrovasculature. These were purposely chosen to represent domains of disparate hemodynamic nature, ranging from high-velocity aortic jets to slow diastolic flows traversing narrow cerebrovascular arteries. With modelling details described in separate work [11], [16], [25], below follows a brief overview of utilized models:
Cardiac: Patient-specific models of the left heart including left atrium, left ventricle, and left ventricular outflow tract were utilized from four (n=4) different subjects, each with varying degrees of simulated mitral regurgitation (one grade 1, two grade 2, and one grade 4). Models based on medical input data were calibrated and simulated as described in Bonini et al. [25].
Aortic: Patient-specific models of the thoracic aorta were utilized from three (n=3) different subjects: one without any vascular disease; two with coarcted narrowings just distal to the left subclavian artery. Data was extracted from the aortic root to a distal part of the descending aorta. Models were identical to the ones simulated and used for super-resolution training in Ferdian et al. [16].
Cerebrovascular: Patient-specific models of the arterial cerebrovasculature were utilized from four (n=4) different subjects: one without any cerebrovascular disease; one with severe stenosis in the right proximal internal carotid artery (ICA); one with bilateral carotid stenosis; and one being the bilateral stenosis case after surgical re-opening of the right proximal ICA. Models were identical to the ones used for super-resolution training in Ferdian et al. [17], with modelling details provided in preceding work [11], [17].
Additionally, in order to assess network performance in an unseen domain, a fourth model compartment was also defined:
Aortic dissection: Patient-specific CFD modelling was performed on one (n=1) subject with a medically managed type B aortic dissection, exhibiting a primary entry and exit tear with no septal fenestrations in the thoracic segment.
Imaging data was extracted for the entire thoracic type B aorta, covering the aortic root, branching into false and true lumen, and cutting the model at a distal descending end at around diaphragm level. Modelling was performed for this study, although CFD details follow equivalent steps presented in similar, previous work [26]. The aortic dissection model was purposely selected to represent not only a domain withheld from training, but a domain of highly complex hemodynamic nature.
Fig. 1. Overview of the baseline network architectures, with a number of base learners drawing from pooled data before being ensembled through either bagging or stacking approaches. The base learner architecture 4DFlowNet (presented elsewhere [16]) is shown in brevity on the top right.

2) Synthetic image generation: To allow for clinically relevant training data, nodal CFD data was converted into pairs of synthetic 4D Flow MRI using a pipeline described in Ferdian et al. [16], [17]. In brief, CFD output was sampled onto uniform voxelized image grids, with noise-free high-resolution data generated at spatial samplings of dx = 0.5, 0.75, 1, and 1.5 mm isotropic, respectively. To create low-resolution equivalents mimicking acquired 4D Flow MRI data, high-resolution data was downsampled through appropriate k-space cropping along with the addition of zero-mean Gaussian noise in the complex signal. In our work, high:low resolution pairs were created at a factor of 1:2. Complementing the synthetic phase data, synthetic magnitude images were generated from the corresponding fluid region segmentations, obtained from the CFD output.
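The downsampling step can be illustrated with a minimal NumPy sketch. It is not the pipeline of Ferdian et al. [16], [17]; the function name `downsample_kspace`, the phase encoding via `venc`, and the parameterisation of the noise through `noise_std` are assumptions made for illustration only.

```python
import numpy as np


def downsample_kspace(velocity, magnitude, venc, factor=2, noise_std=0.0, rng=None):
    """Downsample one velocity component by cropping central k-space.

    velocity, magnitude: 3D arrays on the high-resolution grid.
    venc: velocity encoding, assumed larger than max |velocity| (no aliasing).
    noise_std: std of the zero-mean Gaussian noise added to the real and
               imaginary k-space channels (hypothetical parameterisation).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Encode velocity as the phase of a complex-valued MR signal.
    signal = magnitude * np.exp(1j * np.pi * velocity / venc)

    # Move to k-space with the zero-frequency sample at the array centre.
    kspace = np.fft.fftshift(np.fft.fftn(signal))

    # Keep only the central 1/factor of k-space along every axis.
    crop = []
    for n in kspace.shape:
        keep = n // factor
        start = n // 2 - keep // 2  # keeps the zero-frequency sample centred
        crop.append(slice(start, start + keep))
    cropped = kspace[tuple(crop)]

    # Zero-mean Gaussian noise in the complex signal.
    if noise_std > 0:
        cropped = cropped + rng.normal(0.0, noise_std, cropped.shape) \
            + 1j * rng.normal(0.0, noise_std, cropped.shape)

    # Back to image space; rescale so that the signal magnitude is preserved.
    low = np.fft.ifftn(np.fft.ifftshift(cropped)) * (cropped.size / kspace.size)

    low_velocity = np.angle(low) / np.pi * venc
    low_magnitude = np.abs(low)
    return low_velocity, low_magnitude
```

With `factor=2`, an 8 × 8 × 8 input yields a 4 × 4 × 4 output, matching the 1:2 pairing described above. Each Cartesian velocity component would be processed separately in this sketch.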
3) Training patches and data augmentation: To generate a larger number of training sets, the voxelized representations were split into 3D patches of $12^3$ voxels throughout the selected field-of-view, enforcing each patch to contain a minimum of 5% non-stationary voxels. With each time frame treated independently, data heterogeneity was introduced by varying velocity encoding (VENC) across the cardiac cycle, leading to varying SNR in subsequent data patches (note that VENC was consistently kept above the maximum velocity to avoid aliasing). Patch-based data augmentation was introduced by rigid Cartesian rotation (90/180/270°) to avoid directional bias.
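As a rough illustration of the patching and rotation steps, the sketch below extracts non-overlapping $12^3$ patches that satisfy the 5% threshold and rotates a velocity patch about one axis. The function names, the non-overlapping stride, and the restriction to rotations about the third axis are assumptions; the source does not specify these details. Note that rotating a vector field requires permuting the velocity components as well as the voxel grid.

```python
import numpy as np


def extract_patches(velocity, moving_mask, size=12, min_fraction=0.05):
    """Split a (3, X, Y, Z) velocity field into cubic patches.

    A patch is kept only if at least `min_fraction` of its voxels are
    non-stationary according to the binary `moving_mask` of shape (X, Y, Z).
    Non-overlapping tiling is an assumption of this sketch.
    """
    patches = []
    nx, ny, nz = moving_mask.shape
    for i in range(0, nx - size + 1, size):
        for j in range(0, ny - size + 1, size):
            for k in range(0, nz - size + 1, size):
                window = (slice(i, i + size), slice(j, j + size), slice(k, k + size))
                if moving_mask[window].mean() >= min_fraction:
                    patches.append(velocity[(slice(None),) + window])
    return patches


def rotate_patch_z(velocity, k=1):
    """Rotate a (3, X, Y, Z) patch by k * 90 degrees about the third axis.

    np.rot90 turns the grid from the first towards the second axis, so the
    in-plane components transform as (vx, vy) -> (-vy, vx).
    """
    out = velocity
    for _ in range(k % 4):
        vx, vy, vz = (np.rot90(c, 1, axes=(0, 1)) for c in out)
        out = np.stack([-vy, vx, vz])
    return out


def augment(patch):
    """Return the 90/180/270 degree rotated copies of a patch."""
    return [rotate_patch_z(patch, k) for k in (1, 2, 3)]
```

A uniform flow along the first axis ends up along the second axis after one 90° rotation, which is a quick sanity check that grid and components are rotated consistently.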
Through the above, a total of 13900, 21300, and 30846 patches were created for the cardiac, aortic, and cerebrovascular models, with a data split of 6:2:2 between training, validation, and testing. Note that data was partitioned model-wise rather than sample-wise to maintain integrity and independence of data during training and evaluation.

B. Network setups
To systematically assess the impact of ensemble learning on super-resolution performance, a variety of network setups were evaluated:
1) Baseline super-resolution network: As a basis for comparison, the residual network 4DFlowNet [16] served as a baseline framework (see architecture in Figure 1, top right). Previously published and validated across various isolated domains [16], [17], the network utilizes two core input paths: 3D image patches of the assessed anatomy (magnitude) and of velocity (phase) for all Cartesian velocity directions. Once fed into the network, data passes through stacked convolutional and residual blocks including a core upsampling layer, before generating output in the form of super-resolved velocity patches in each Cartesian velocity direction $(v_x, v_y, v_z)$.
2) Isolated models: To serve as a baseline for how networks trained on isolated cardiovascular domains perform, three 4DFlowNet networks were trained with data coming from the compartments described in Section II-A.1. This resulted in so-called isolated networks trained only on cardiac data (4DFlowNet-Cardiac), aortic data (4DFlowNet-Aorta), or cerebrovascular data (4DFlowNet-Cerebro).
3) Combined baseline model: Advancing from the isolated models, a combined baseline model was created, maintaining the 4DFlowNet architecture but merging datasets from all models into one. To account for imbalance between compartment data, a loss function weighting scheme was introduced, balancing compartment influence on a per-batch level (see Section II-B.5).
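One way such per-batch balancing could look is sketched below. This is an assumption-laden illustration rather than the loss used in the study: the per-sample weight is taken as $K_i = 1/S_i$ (with $S_i$ the number of batch samples from compartment $i$), normalised over the compartments present in the batch, and the weight regularisation term is left out. All function names are hypothetical.

```python
import numpy as np


def compartment_sample_weights(compartment_ids):
    """Per-sample weights that give every compartment equal batch influence.

    Assumed form: each sample of compartment i receives K_i = 1 / S_i,
    divided by the number of compartments present so the weights sum to one.
    """
    ids = np.asarray(compartment_ids)
    present, counts = np.unique(ids, return_counts=True)
    k = dict(zip(present.tolist(), (1.0 / counts).tolist()))
    weights = np.array([k[c] for c in ids.tolist()])
    return weights / len(present)


def batch_loss(pred, ref, fluid_mask, compartment_ids, eps=1e-8):
    """Compartment-weighted squared-error loss, split into fluid and non-fluid parts.

    pred, ref: (B, 3, X, Y, Z) velocity patches; fluid_mask: (B, X, Y, Z) in {0, 1}.
    """
    sq_err = np.sum((pred - ref) ** 2, axis=1)  # squared error per voxel
    axes = (1, 2, 3)
    non_fluid_mask = 1.0 - fluid_mask
    # Average separately over fluid and non-fluid voxels so that the
    # (typically larger) non-fluid region does not dominate the loss.
    fluid = (sq_err * fluid_mask).sum(axes) / (fluid_mask.sum(axes) + eps)
    non_fluid = (sq_err * non_fluid_mask).sum(axes) / (non_fluid_mask.sum(axes) + eps)
    weights = compartment_sample_weights(compartment_ids)
    return float(np.sum(weights * (fluid + non_fluid)))
```

For a batch with three samples from one compartment and one from another, the single sample receives weight 1/2 and the other three receive 1/6 each, so both compartments contribute equally.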
4) Ensemble models: Moving beyond input data variations, two general ensemble learning approaches were explored (see Figure 1):
Bagging: Being one of the most common ensemble strategies, bagging consists of fitting several base models on different bootstrap samples before aggregating them. Here, bagging was implemented using singular 4DFlowNet models as base models, with training samples drawn randomly from the available training data. Throughout, replacement sampling was allowed, with base learner sample size N equal to that of the original training set. A soft voting ensemble was utilized, invoking average weighting of single models in fusion prediction.
Stacking: Representing a second family of ensemble approaches, stacking uses a trained meta-learner to fuse the input base models. Base learners are again represented by singular 4DFlowNet models trained on sub-samples of all available training data. For the fusion meta-learner, we employed a single 8-layer convolutional feed-forward network, with input and output identical to that of 4DFlowNet.
5) Loss function: The optimization target was defined by a velocity data matching term, $l_{MSE}$, evaluated over the $N$ voxels in a given patch. To compensate for fluid/non-fluid imbalances, the loss function was split into fluid and non-fluid contributions, with $\lambda = 5 \cdot 10^{-7}$ introduced on network weights $w_i$. $w_c$ was introduced as a compartment weight, compensating for imbalances between different training compartments, with $N_c$ the number of compartments, $S_i$ the number of samples of compartment $i$ in the assessed batch, and $K_i = 1/S_i$.
6) Training: All networks were implemented using TensorFlow 2.6.0 [27] with a Keras backend [28]. The Adam optimizer was used with an initial learning rate of $10^{-4}$ and a learning rate decay of $\sqrt{2}$. Training was performed on two NVIDIA A100 Tensor Core GPUs. With base and meta models trained for 60 and 80 epochs, respectively, this rendered a total training time of 10-15 hours for the non-ensemble networks and 20-25 hours for the ensemble networks. Complete setup and trained weights are publicly available at https://github.com/LeonEricsson/Ensemble4DFlowNet.
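The two fusion strategies can be sketched in a few lines of NumPy. The bootstrap and soft-voting functions follow the description above; the stacking part uses a plain least-squares linear combiner as a stand-in for the 8-layer convolutional meta-learner, which is a deliberate simplification. Function names are hypothetical and do not refer to the published repository.

```python
import numpy as np


def bootstrap_indices(n_samples, n_learners, seed=0):
    """Draw one bootstrap sample (with replacement, size N) per base learner."""
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_learners)]


def soft_vote(predictions):
    """Bagging fusion: equal-weight average of the base learner outputs."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


def fit_linear_stacker(predictions, target):
    """Stacking fusion (simplified): learn one scalar weight per base learner.

    Solves a least-squares problem mapping base predictions to the target.
    A convolutional meta-learner would replace this step in practice.
    """
    features = np.stack([p.ravel() for p in predictions], axis=1)
    weights, *_ = np.linalg.lstsq(features, target.ravel(), rcond=None)
    return weights


def apply_linear_stacker(predictions, weights):
    """Combine base predictions with the learned stacking weights."""
    return np.tensordot(weights, np.stack(predictions, axis=0), axes=1)
```

The contrast between the two is that `soft_vote` has no trainable parameters, whereas the stacker is fitted on held-out predictions and can therefore down-weight a biased base learner.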

C. Performance evaluation
1) Parametric in-silico validation and quantitative accuracy assessment: To validate network performance, synthetic 4D Flow MRI data from Section II-A.1 was utilized, comparing high-resolution velocities to super-resolved equivalents. Focusing on domain generalization, performance was consistently evaluated on cardiac, aortic, and cerebrovascular test cases along with overall average performance. With evaluation metrics defined in Section II-C.4, below follows a brief overview of the parametric assessments performed:

Baseline vs. ensemble:
To provide an estimate of ensemble potential, ensemble models were initially compared against isolated and combined baseline models. Serving as a first benchmark for ensemble performance, evaluation was performed for bagging and stacking networks consisting of two homogeneous base learners.

Number of base learners:
To assess how ensemble performance scaled with the number of input base models, ensemble models created from an increasing number of base learners were evaluated (ranging from 2 to 12).

Compartmentalized vs. non-compartmentalized:
To quantify how variations in base learner training data influenced performance, ensemble networks consisting of base learners sampling from a single (compartmentalized) vs. a pooled (non-compartmentalized) domain of training data were compared. Models were defined with three homogeneous base learners.
Architectural heterogeneity: To assess how heterogeneity in base learner architecture influenced performance, bagging and stacking models built from three homogeneous base learners were compared to models built on three heterogeneous base learners, where heterogeneity was introduced by replacing residual blocks with corresponding dense or cross stage partial blocks (similar to [21]).
2) Quantifying generalizability into an unseen domain: Seeking to quantify network generalizability in out-of-domain settings, ensemble networks were also evaluated on synthetic 4D Flow MRI from the unseen aortic dissection domain. Consistently, performance of the best performing networks from Section II-C.1 was compared against isolated and combined baseline models.
3) In-vivo verification and clinical potential: To translate the in-silico results into an in-vivo setting, network performance was evaluated on 4D Flow MRI acquired with research sequences.Data was retrospectively assembled from both thoracic (n=5) and cerebrovascular (n=5) subjects, respectively, with specific scan parameters provided in Table I.All clinical acquisitions followed institutional review board (IRB) approval, with patients referred for MRI either based on clinical indication (thoracic) or research-based study inclusion (cerebrovascular).
Lacking high-resolution reference data, we opted for downsampling acquired clinical data, assessing how super-resolution networks can recover initial native resolution. For this, clinical data was downsampled by a factor of two through k-space truncation (identical to Section II-A.2). Using our proposed baseline and ensemble networks, recovered super-resolution velocity fields were compared to the natively acquired input data, evaluating performance within left ventricular, aortic, or cerebrovascular flow domains, respectively.
4) Evaluation metrics: To measure network performance, relative speed error, RE, was defined from the reference and predicted velocities, $V$ and $V'$, with $\epsilon = 10^{-4}$ introduced to avoid zero-division. A tanh function was introduced to mitigate over-penalizing low velocities.
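A plausible form of such a metric is sketched below, purely as an assumption: the per-voxel error is taken as the magnitude of the velocity difference divided by the reference speed plus $\epsilon$, passed through tanh, and averaged over fluid voxels. The exact expression used in the study may differ, and the function name is hypothetical.

```python
import numpy as np


def relative_speed_error(v_ref, v_pred, fluid_mask, eps=1e-4):
    """Bounded relative error in percent (assumed form, see lead-in).

    v_ref, v_pred: (3, X, Y, Z) reference and predicted velocities.
    fluid_mask: boolean (X, Y, Z) array selecting the voxels to evaluate.
    """
    diff = np.sqrt(np.sum((v_pred - v_ref) ** 2, axis=0))
    speed = np.sqrt(np.sum(v_ref ** 2, axis=0))
    # eps avoids division by zero; tanh caps the per-voxel error at 1 so that
    # near-stationary voxels cannot dominate the average.
    per_voxel = np.tanh(diff / (speed + eps))
    return 100.0 * per_voxel[fluid_mask].mean()
```

Because tanh saturates at one, the metric stays below 100% even where the reference speed approaches zero.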
Beyond the relative metric above, root mean square errors (RMSE) were estimated across the entire fluid and non-fluid domain. To quantify possible estimation bias, linear regression analysis was performed for all super-resolved networks, defining linear regression slopes, $k$, and coefficient of determination, $R^2$, for each Cartesian velocity direction, respectively.
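These complementary metrics are standard and can be computed as in the following sketch, applied to one Cartesian velocity component at a time. The function names are illustrative, and the ordinary least-squares fit via `np.polyfit` is an assumption about how the regression was performed.

```python
import numpy as np


def rmse(v_ref, v_pred):
    """Root mean square error between reference and predicted velocities."""
    return float(np.sqrt(np.mean((v_pred - v_ref) ** 2)))


def regression_stats(v_ref, v_pred):
    """Slope k and coefficient of determination R^2 of v_pred against v_ref."""
    x = np.ravel(v_ref)
    y = np.ravel(v_pred)
    k, intercept = np.polyfit(x, y, 1)  # ordinary least-squares line
    residual = y - (k * x + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(k), float(1.0 - ss_res / ss_tot)
```

A slope below one indicates systematic underestimation of velocities, while $R^2$ reflects the spread around the fitted line.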

III. RESULTS
A. Parametric in-silico validation and quantitative accuracy assessment
1) Baseline vs. ensemble: Qualitative comparison between isolated, combined baseline, and two ensemble models is presented in Figure 2. As apparent, distinct noise reduction is achieved by virtually all networks, albeit with visual artifacts when transferring isolated base models into unseen domains. Table II presents summarized error metrics. Overall, isolated models exhibit optimal performance in the domain in which they had been trained, with poor translation into unseen domains. The combined baseline model showed apparent improvement as compared to the isolated models across all domains, with a relative error decrease of 1.77, 2.16, and 2.07% in the cardiac, aortic, and cerebrovascular domains, respectively. Underestimation bias was also mitigated by the combined baseline model.

Number of base learners:
Keeping all base learners identical, bagging scaled with the number of base learners, with performance peaking at 12 base learners (average RE = 22.01%, mean RMSE = 1.17 cm/s). In contrast, stacking displays inverse behaviour, with accuracy decreasing with an increasing number of homogeneous base learners (RE = 22.14%, given at two base learners). This holds true also for bias metrics from the linear regression analysis. Comparing the two approaches, the best bagging and stacking approaches are seemingly interchangeable, with strong correlations and low errors observed across all domains (18 vs. 22 metrics perform better in bagging vs. stacking across all domains).

Compartmentalized vs non-compartmentalized:
As given in the middle section of Table III, stacking is able to fuse compartmentalized base learners better than bagging, with an average relative error of 23.90% vs. 35.01%. Still, compartmentalized ensemble models consistently underperform as compared to non-compartmentalized equivalents. This holds across all metrics, with bagging particularly suffering from compartmentalized learners (relative errors > 30%).

Architectural heterogeneity:
The bottom part of Table III provides results for bagging and stacking containing base learners with varying architectural blocks. The given stacking permutation (Stacking Blocks-3) shows the best overall performance of all stacking variations (average relative error of 21.48%, average RMSE = 1.08 cm/s, $k$ = 0.933, and $R^2$ = 0.933). Bagging, on the other hand, does not show the same benefit of architectural heterogeneity; instead, a maximized number of input learners (Bagging-12) is the model with optimal performance across all permutations.

B. Quantifying generalizability into an unseen domain
Table IV provides evaluation metrics for the unseen aortic dissection, with Bagging-12 and Stacking Blocks-3 used as optimal ensemble models. As observed, isolated models exhibit significant difficulties translating into an unseen domain, with the cerebrovascular network having particularly poor performance (relative error = 75.57%, average RMSE = 60.37 cm/s). In comparison, ensemble methods exhibit high accuracy across all metrics, with relative error = 25.42% and 24.82%, and average RMSE = 2.63 and 2.17 cm/s given for bagging and stacking, respectively. Concerning estimation bias, combined baseline, bagging, and stacking all show accurate behaviour with low spread (linear regression data shown in Figure 4). Further, qualitative renderings of recovered flow features are shown in Figure 3.

C. In-vivo verification and clinical potential
1) Quantitative assessment through recovery of native resolution: Figure 5 shows exemplary in-vivo images, using super-resolution to recover native input resolution. Qualitatively, both ensemble networks recover high-resolution features along with background noise suppression. Behaviour also seems robust across all domains, with both large-vessel aortic and small-vessel cerebrovascular features captured. Quantifying the above, summarized linear regression statistics are provided in Table V. Consistently, relative errors are lower using ensemble techniques, with Bagging-12 indicating optimal performance across all domains (average relative error = 39.85%). Conversely, bias metrics show slight favouring of the baseline combined approach, with an average $k$ = 0.954 compared to 0.873 and 0.753 for bagging and stacking, respectively. However, regression spread is lower with ensemble techniques, with bagging exhibiting maximum specificity (average $R^2$ = 0.815 vs. 0.796 and 0.786 for combined baseline and stacking, respectively).

IV. DISCUSSION
In this study, we have evaluated the utility of ensemble learning for super-resolution 4D Flow MRI, assessing its ability to generalize across various cardiovascular domains. As reported, ensembling along with incorporation of disparate training data distinctly improves domain generalization, with recovery of high-resolution velocities validated on both synthetic and clinical datasets across the heart, aorta, and brain. Considering the disparate hemodynamic conditions apparent across the cardiovasculature, our results thus bear particular clinical promise, opening up for generalizable super-resolution performance across domains using a single network setup.

A. Base vs. ensemble learning
As observed across all synthetic datasets, ensemble approaches consistently outperform isolated base learners (Table II). Notably, the benefit is not only observed when moving outside an isolated model's domain-of-training, but also within the setting of an isolated learner. These results not only speak to the benefit of ensemble approaches [29] but also highlight limitations in utilizing isolated training beds with a limited number of patient sets. In our work, isolated networks are trained on ∼20,000 patches: a figure comparable to what has been previously utilized for medical super-resolution [16]-[18], but small in contrast to non-medical equivalents. Increasing training data is a common strategy for improved performance, but here our work highlights the benefit of doing so using data from various compartments, with the combined baseline model outperforming isolated learners. Adding ensemble strategies can further improve performance, enabling optimal weighting between individual learners.
The benefit of ensembling and data pooling is emphasized when transitioning into the unseen aortic dissection, where all isolated models show significant errors. Poor domain generalization has been reported for networks trained on single-domain data [29], and our results corroborate this in the setting of 4D Flow MRI. Moving into an unseen domain also highlights the benefits of combining learners, with ensemble networks improving on the combined baseline model. This is a particularly important feature in seeking generalizable performance, where data heterogeneity is observed both between domains and patients. The use of ensemble approaches thus opens up for more unified analysis, super-resolving datasets at maintained accuracy in a diverse clinical reality.

B. Parametric ensemble evaluation
In an attempt to optimize ensemble performance, a range of networks were assessed in Section III-A. Although variations were overall minor, a few notable trends can be observed: First, the number of base learners had opposite effects on the two assessed approaches: bagging improving but stacking worsening with an increased number of base learners. For bagging, being a mere deterministic aggregation of base learners, bias and variance are typically reported to decrease with the number of base learners, leading to an accuracy plateau at an empirically determined base learner density [30]. The meta-architecture of stacking, on the other hand, does not scale with base learner quantity but rather with base learner diversity [31]; a fact corroborated by the results on architectural variations.
Second, the use of compartmentalized base learners had a consistently detrimental effect on overall performance. The reason for this most likely lies in the underperformance of our isolated base learners, where ensemble combinations alone cannot overcome the bias exhibited by the base learners themselves. Bagging suffers particularly from compartmentalized base learners, where deterministic weight averaging renders pronounced errors across all domains. These results seemingly contradict the notion that input diversity is one way of improving ensemble performance [23], [29]; however, that notion does not necessarily cover scenarios where base learners are extended into out-of-distribution settings.
Third, architectural variations were beneficial for the stacking setup, with optimal performance given for the Stacking Blocks-3 network. In our study, architectural variation was not achieved by replacing overall architecture, but by replacing internal layers similar to how 4DFlowNet has been altered in previous work [21]. Higher degrees of heterogeneity could be offered by combining super-resolution networks of different core architectures, which is as of yet unexplored for super-resolution 4D Flow MRI.

C. In-vivo feasibility and clinical utility
To explore clinical translation, ensemble networks were assessed in-vivo, recovering native resolution from synthetically downsampled data. As reported, performance is kept stable across domains, although biases and errors are more pronounced as compared to the in-silico results. Here, comparisons between in-silico and in-vivo results should be viewed in light of the inherent differences between the datasets. In the in-vivo setting, 4DFlowNet is actually not trying to recover native input images directly, but rather a de-noised equivalent. As such, increased in-vivo errors do not necessarily stem from sub-optimal network performance, but also from differences between noisy native and de-noised recovered images.
As a final note on clinical utility, it is worth highlighting that our networks are directly applicable for true super-resolution image conversion. To exemplify, Figure 6 showcases two such qualitative examples, indicating how both intracardiac vortices and cerebrovascular flow features can be resolved at beyond clinical resolution, all using a single ensemble network.

D. Scientific contextualization
Although, to the best of our knowledge, no previous work has attempted ensembling techniques to super-resolve 4D Flow MRI data, or explored generalizability of super-resolution 4D Flow MRI, it is worth contrasting our results to previously published work within related spaces. In the non-medical field, ensemble learning has been reported as one of the more promising domain generalization approaches [29]. Ju et al. evaluated bagging and stacking of residual learners for image classification, reporting incremental performance improvement in line with our findings [32]. Similarly, Nguyen et al. [33] utilized stacking of heterogeneous learners, reporting slight improvement as compared to single base learners. For improved super-resolution generalization, examples include exploration of heterogeneous training data [34], or leveraging domain-specific image priors [35]. The latter presents an appealing approach for unifying behaviour across e.g. vendors or centres; however, for the sake of generalizability across flow domains, hemodynamic differences are inherent to the physiological nature of the observed domain.
In a medical setting, Lyu et al. [13] presented one of few examples using ensemble learning for super-resolution MRI. Using generative adversarial networks in a stacking setup, they highlighted the ability to super-resolve anatomical MRI, however focusing on a single anatomical domain. For 4D Flow MRI, Shit et al. [18] trained on both thoracic and cerebrovascular data, using transfer learning to translate in-silico results to in-vivo. Although a direct comparison is obstructed by differences in available datasets, our reported ensemble output (RMSE ∼1-2 cm/s) appears non-inferior in comparison across all domains (RMSE ∼2-4 cm/s). Beyond this, recent PINN work [20] promises increased super-resolution accuracy, but its utility in an ensemble setting remains to be assessed.

E. Limitations and future work
A few limitations are worth pointing out. First, training was performed on synthetic 4D Flow MRI without inclusion of acquired in-vivo data. Acquiring clinical data for super-resolution purposes is difficult due to practical considerations (scan time, SNR), not least considering the notion of going beyond practical resolution limits. The use of k-space data conversion is instead purposely introduced to mitigate the effect of in-silico-to-in-vivo discrepancies, resembling the subsampling of an MR scanner.
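To illustrate the idea of k-space based downsampling, the following minimal Python sketch crops the central part of k-space of a complex-valued image built from magnitude and velocity data. The function name `downsample_kspace`, the phase encoding `pi * v / venc`, and the omission of added noise are assumptions made for illustration; the sketch is not the exact pipeline used to generate our training data.

```python
import numpy as np

def downsample_kspace(magnitude, velocity, venc, factor=2):
    """Hypothetical helper: downsample one velocity component by k-space cropping."""
    # Assumed encoding: velocity maps linearly onto phase in [-pi, pi].
    complex_img = magnitude * np.exp(1j * np.pi * velocity / venc)

    # Transform to k-space with the zero frequency moved to the array centre.
    kspace = np.fft.fftshift(np.fft.fftn(complex_img))

    # Keep only the central (low-frequency) block along every axis.
    small_shape = [n // factor for n in kspace.shape]
    crop = tuple(
        slice(n // 2 - m // 2, n // 2 - m // 2 + m)
        for n, m in zip(kspace.shape, small_shape)
    )
    kspace_low = kspace[crop]

    # Back to image space; rescale so that signal intensity is preserved.
    scale = kspace_low.size / kspace.size
    low_res = np.fft.ifftn(np.fft.ifftshift(kspace_low)) * scale

    low_magnitude = np.abs(low_res)
    low_velocity = np.angle(low_res) / np.pi * venc
    return low_magnitude, low_velocity
```

Cropping in k-space, rather than averaging voxels in image space, mimics how a scanner samples fewer spatial frequencies at lower resolution. A realistic pipeline would typically also add complex noise in k-space before the inverse transform.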
Second, although tested with respect to recovery of native resolution, no in-vivo comparison was performed between acquired high- and low-resolution data. This again comes down to the difficulty of acquiring paired high- and low-resolution data. The concept of recovering downsampled data has been explored by others in previous super-resolution work [18], highlighting the practicality of the approach.
For future work, a number of directions can be envisioned, including exploration of diverse base learners or incorporation of clinical training data. Efforts to integrate super-resolution algorithms as an in-line scanner utility would also greatly broaden use cases. Nevertheless, our data highlight how ensemble techniques could help generalize the use of super-resolution imaging, circumventing the need for purpose-built networks and opening the way for wider incorporation of super-resolution imaging in cardiovascular 4D Flow MRI work.

V. CONCLUSION
In this study, we have shown how ensemble learning enables super-resolution conversion of clinically acquired 4D Flow MRI, with accurate performance generalizing across disparate flow domains. Using a combination of synthetic training data from different cardiovascular compartments, we have shown how ensemble approaches maintain accurate performance across unseen domains, as well as improve on singular base learner performance. Satisfactory recovery of native resolution in-vivo also highlights performance transfer into a direct patient setting, applicable across the heart, aorta, and brain.

ACKNOWLEDGMENT
Computations were performed on resources provided by the National Academic Infrastructure for Supercomputing in Sweden at the National Supercomputer Centre at Linköping University (Berzelius). We also thank Ning Jin, PhD, at Siemens Medical Solutions USA, Inc., and Daniel Giese, PhD, at Siemens Healthcare GmbH, Erlangen, Germany, for providing the thoracic 4D Flow research sequences.

Fig. 3. Qualitative visualization of super-resolution conversion of the unseen aortic dissection domain (left, with the stacking setup representing the super-resolution conversion), along with representative cross-sections (right). All renderings are performed using calculated velocity magnitudes.

Fig. 4. Linear regression plots in the unseen aortic dissection, given for isolated baseline (top row) and combined baseline, bagging, and stacking (bottom row) learners, respectively, showing velocities in x, y, and z from left to right, all normalized to a [-1, 1] range.

Fig. 6. Qualitative vector field rendering of clinical data upsampled by a factor of two beyond native acquisition resolution. Examples shown for both cardiac (top) and cerebrovascular (bottom) data, for both bagging (middle) and stacking (right).

TABLE I. ACQUISITION PARAMETERS FOR THE UTILIZED in-vivo DATA.

TABLE II. ESTIMATED EVALUATION METRICS ACROSS ISOLATED, COMBINED BASELINE AND ENSEMBLE MODELS WITH TWO BASE LEARNERS EACH.
Each metric's best value is highlighted in bold. Arrows indicate direction of improvement. RMSE given in cm/s. RMSE, k, and R² given by (vx, vy, vz).

TABLE III. EVALUATION METRICS FOR ENSEMBLE METHOD PERMUTATIONS INCLUDING NUMBER OF BASE LEARNERS (TOP PART), COMPARTMENTALIZED VS. NON-COMPARTMENTALIZED BASE LEARNERS (MIDDLE PART), AND BASE LEARNERS OF VARYING ARCHITECTURES (BOTTOM PART).

TABLE IV. PREDICTION ERRORS OF ISOLATED, COMBINED AND ENSEMBLE MODELS ON THE UNSEEN AORTIC DISSECTION DATA.
Note: Each metric's best value is highlighted in bold font. Arrows indicate direction of improvement. RMSE given in cm/s. RMSE, k, and R² given by (vx, vy, vz).